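Perception systems for autonomous driving have advanced significantly over the past few years. However, these systems struggle to remain robust in extreme weather, because lidars and cameras, the primary sensors in a sensor suite, degrade under these conditions. To address this problem, camera-radar fusion systems offer a unique opportunity for reliable, high-quality perception in all weather. Cameras provide rich semantic information, while radars can work through occlusions and in all weather conditions. In this work, we show that state-of-the-art fusion methods perform poorly when the camera input is degraded, which essentially forfeits the all-weather reliability they set out to achieve. In contrast to these approaches, we propose a new method, RadSegNet, that uses a new design philosophy of independent information extraction and truly achieves reliability in all conditions, including occlusions and adverse weather. We develop and validate our system on the benchmark Astyx dataset and further verify these results on the RADIATE dataset. Compared with state-of-the-art methods, RadSegNet achieves a 27% improvement on Astyx and a 41.46% improvement on RADIATE in average precision score, and it maintains significantly better performance in adverse weather conditions.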
Retrieval-augmented in-context learning has emerged as a powerful approach for addressing knowledge-intensive tasks using frozen language models (LM) and retrieval models (RM). Existing work has combined these in simple "retrieve-then-read" pipelines in which the RM retrieves passages that are inserted into the LM prompt. To begin to fully realize the potential of frozen LMs and RMs, we propose Demonstrate-Search-Predict (DSP), a framework that relies on passing natural language texts in sophisticated pipelines between an LM and an RM. DSP can express high-level programs that bootstrap pipeline-aware demonstrations, search for relevant passages, and generate grounded predictions, systematically breaking down problems into small transformations that the LM and RM can handle more reliably. We have written novel DSP programs for answering questions in open-domain, multi-hop, and conversational settings, establishing in early evaluations new state-of-the-art in-context learning results and delivering 37-200%, 8-40%, and 80-290% relative gains against vanilla LMs, a standard retrieve-then-read pipeline, and a contemporaneous self-ask pipeline, respectively.
A vast amount of data is generated every second as technology grows rapidly. Sentiment analysis, the area of research addressed here, attempts to determine the feelings or opinions people express in social media posts. The dataset we used is a multi-source dataset drawn from the comment sections of various social networking sites such as Twitter and Reddit. Natural language processing techniques were employed to perform sentiment analysis on this dataset. In this paper, we provide a comparative analysis of lexicon-based, machine learning, and deep learning approaches: TextBlob for the lexicon-based approach, Naive Bayes for machine learning, and LSTM for deep learning.
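To make the contrast between a lexicon-based and a machine learning approach concrete, the following is a minimal sketch in plain Python. It is not the paper's code and does not use TextBlob: the tiny `LEXICON`, the `TRAIN` comments, and the names `lexicon_label` and `NaiveBayes` are all assumptions made up for illustration. The classifier is a standard multinomial Naive Bayes with add-one smoothing.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()


# Hypothetical polarity lexicon; a real lexicon is far larger.
LEXICON = {"good": 1.0, "great": 1.0, "love": 1.0,
           "bad": -1.0, "awful": -1.0, "hate": -1.0}


def lexicon_label(text):
    """Lexicon-based approach: sum word polarities, no training needed."""
    score = sum(LEXICON.get(tok, 0.0) for tok in tokenize(text))
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return "neutral"


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(n_docs / total_docs)  # class prior
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore words never seen in training
                    logp += math.log((counts[tok] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Made-up training comments, for illustration only.
TRAIN = [
    ("i love this phone", "pos"),
    ("great battery and good screen", "pos"),
    ("i hate the update", "neg"),
    ("awful service and bad support", "neg"),
]
model = NaiveBayes().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
```

The lexicon scorer needs no labeled data but only knows the words in its list, whereas the Naive Bayes model learns word evidence from labeled examples; that trade-off is what a comparison of this kind measures.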
Neural information retrieval (IR) systems have progressed rapidly in recent years, in large part due to the release of publicly available benchmarking tasks. Unfortunately, some dimensions of this progress are illusory: the majority of the popular IR benchmarks today focus exclusively on downstream task accuracy and thus conceal the costs incurred by systems that trade away efficiency for quality. Latency, hardware cost, and other efficiency considerations are paramount to the deployment of IR systems in user-facing settings. We propose that IR benchmarks structure their evaluation methodology to include not only metrics of accuracy, but also efficiency considerations such as a query latency and the corresponding cost budget for a reproducible hardware setting. For the popular IR benchmarks MS MARCO and XOR-TyDi, we show how the best choice of IR system varies according to how these efficiency considerations are chosen and weighed. We hope that future benchmarks will adopt these guidelines toward more holistic IR evaluation.
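One way to picture an evaluation that accounts for efficiency is to pick the highest-quality system among those that fit a latency and cost budget. The sketch below is an assumption-laden illustration, not the paper's methodology: the `SystemResult` fields, the function `best_under_budget`, the system names, and every number in `RESULTS` are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SystemResult:
    name: str
    quality: float        # benchmark accuracy metric; higher is better
    latency_ms: float     # mean query latency on the fixed hardware
    cost_per_hour: float  # hourly price of that hardware setting


def best_under_budget(results: List[SystemResult],
                      max_latency_ms: float,
                      max_cost_per_hour: float) -> Optional[SystemResult]:
    """Return the highest-quality system that satisfies both budgets."""
    feasible = [r for r in results
                if r.latency_ms <= max_latency_ms
                and r.cost_per_hour <= max_cost_per_hour]
    return max(feasible, key=lambda r: r.quality, default=None)


# Hypothetical systems with made-up measurements.
RESULTS = [
    SystemResult("sparse-baseline", quality=0.19, latency_ms=10.0, cost_per_hour=0.20),
    SystemResult("dense-retriever", quality=0.33, latency_ms=60.0, cost_per_hour=1.00),
    SystemResult("late-interaction", quality=0.39, latency_ms=250.0, cost_per_hour=3.00),
]

for budget in [(20.0, 0.50), (100.0, 2.00), (1000.0, 10.00)]:
    winner = best_under_budget(RESULTS, *budget)
    print(budget, "->", winner.name if winner else None)
```

With these invented numbers the winner changes at each budget, which is the kind of effect an accuracy-only leaderboard cannot show.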
With the advancement in computing and robotics, it is necessary to develop fluent and intuitive methods for interacting with digital systems, augmented/virtual reality (AR/VR) interfaces, and physical robotic systems. Hand motion recognition is widely used to enable these interactions. Hand configuration classification and MCP joint angle detection are important for a comprehensive reconstruction of hand motion. sEMG and other technologies have been used for the detection of hand motions. Forearm ultrasound images provide a musculoskeletal visualization that can be used to understand hand motion. Recent work has shown that these ultrasound images can be classified using machine learning to estimate discrete hand configurations. Estimating both hand configuration and MCP joint angles based on forearm ultrasound has not been addressed in the literature. In this paper, we propose a CNN-based deep learning pipeline for predicting the MCP joint angles. The results for hand configuration classification were compared across different machine learning algorithms. SVC with different kernels, MLP, and the proposed CNN were used to classify the ultrasound images into 11 hand configurations based on activities of daily living. Forearm ultrasound images were acquired from 6 subjects instructed to move their hands according to predefined hand configurations. Motion capture data was acquired to obtain the finger angles corresponding to the hand movements at different speeds. An average classification accuracy of 82.7% for the proposed CNN and over 80% for SVC with different kernels was observed on a subset of the dataset. An average RMSE of 7.35 degrees was obtained between the predicted and the true MCP joint angles. A low-latency (6.25 - 9.1 Hz) pipeline has been proposed for estimating both MCP joint angles and hand configuration, aimed at real-time control of human-machine interfaces.
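The two headline numbers in this abstract, an RMSE in degrees and a pipeline rate in Hz, can be made concrete with a short sketch. The Python below is an illustration only, not the authors' code: the function names `rmse` and `frame_period_ms` and the sample angles are assumptions introduced here.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def frame_period_ms(rate_hz):
    """Time available per prediction, in milliseconds, at a given rate."""
    return 1000.0 / rate_hz


# Made-up joint angles in degrees, for illustration only.
predicted_angles = [10.0, 20.0, 30.0]
true_angles = [13.0, 16.0, 30.0]
print(f"RMSE: {rmse(predicted_angles, true_angles):.2f} degrees")

# The rates reported in the abstract, converted to per-frame time.
for rate in (6.25, 9.1):
    print(f"{rate} Hz -> {frame_period_ms(rate):.1f} ms per frame")
```

Read this way, the reported 6.25 - 9.1 Hz range corresponds to roughly 160 ms down to about 110 ms per prediction.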
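Text segmentation aims to divide text into contiguous, semantically coherent segments, while segment labeling deals with producing a label for each segment. Past work has shown success in tackling segmentation and labeling for documents and conversations. This has been possible through a combination of task-specific pipelines and supervised and unsupervised learning objectives. In this work, we propose a single encoder-decoder neural network that can handle long documents and conversations, trained simultaneously for both segmentation and segment labeling using only standard supervision. We successfully show a way to solve the combined task as a pure generation task, which we refer to as structured summarization. We apply the same technique to both document and conversational data, and show state-of-the-art performance across datasets under both high-resource and low-resource settings. Our results establish a strong case for considering text segmentation and segment labeling as a whole, and for moving toward general-purpose techniques that do not depend on domain expertise or task-specific components.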
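This paper proposes transfer initialization with modified fully connected layers for COVID-19 diagnosis. Convolutional neural networks (CNNs) have achieved remarkable results in image classification. However, training a high-performing model is a very complicated and time-consuming process because of the complexity of image recognition applications. Transfer learning, on the other hand, is a relatively new learning method that has been employed in many fields to achieve good performance with less computation. In this research, PyTorch pre-trained models (VGG19_bn and WideResNet-101) are applied to the MNIST dataset for the first time as initialization, with modified fully connected layers. The PyTorch pre-trained models used were previously trained on ImageNet. The proposed model was developed and verified in a Kaggle notebook, and it reached an outstanding accuracy of 99.77% without requiring a huge amount of computational time during network training. We also applied the same methodology to the SIIM-FISABIO-RSNA COVID-19 Detection dataset and achieved 80.01% accuracy. In contrast, previous methods need a huge amount of computational time during training to reach a high-performing model. Code is available at the following link: github.com/dipuk0506/spinalnet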
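Optical flow estimation in omnidirectional videos faces two significant issues: the lack of benchmark datasets and the challenge of adapting video-based methods to the omnidirectional nature of the data. This paper proposes Flow360, the first perceptually natural synthetic omnidirectional benchmark dataset with a 360-degree field of view, containing 40 different videos and 4,000 video frames. We conduct a comprehensive characteristic analysis and comparison between our dataset and existing optical flow datasets, which demonstrates its perceptual realism, uniqueness, and diversity. To accommodate the omnidirectional nature, we present a novel Siamese representation learning framework for omnidirectional flow (SLOF). We train our network in a contrastive manner with a hybrid loss function that combines a contrastive loss and an optical flow loss. Extensive experiments verify the effectiveness of the proposed framework and show a 40% performance improvement over state-of-the-art methods. Our Flow360 dataset and code are available at https://siamlof.github.io/.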
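Recent successful adversarial attacks on face recognition show that, despite the remarkable progress of face recognition models, they still fall far behind human perception and recognition. This reveals the vulnerability of deep convolutional neural networks (CNNs), the state-of-the-art building block of face recognition models, which can have serious consequences for security systems. Gradient-based adversarial attacks have been widely studied and have proved successful against face recognition models. However, finding the optimized perturbation for each face requires submitting a large number of queries to the target model. In this paper, we propose a recursive adversarial attack on face recognition using automatic face warping, which needs an extremely limited number of queries to fool the target model. Instead of a random face warping procedure, the warping functions are applied to specific detected regions of the face, such as the eyebrows, nose, and lips. We evaluate the robustness of the proposed method in the decision-based black-box attack setting, where attackers have no access to the model parameters and gradients, but the target model provides hard-label predictions and confidence scores.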
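Providing feedback on learners' argumentation is essential for developing critical thinking skills, but it requires a great deal of time and effort. To reduce the burden on teachers, we aim to automate the process of providing feedback, in particular giving diagnostic comments that point out weaknesses inherent in an argument. Diagnostic comments should be specific so that learners can recognize the diagnosis without misinterpretation. However, it is not obvious how the task of providing specific diagnostic comments should be formulated. We formulate the task as template selection and slot filling, which makes automatic evaluation easier and the model's behavior more tractable. The key to this formulation is whether a template set sufficient for practical use can be created. In this paper, we define three criteria that a template set should satisfy (expressiveness, informativeness, and uniqueness) and, as a first trial, verify the feasibility of creating a template set that satisfies them. Through an annotation study, we show that converting diagnostic comments given as text into the template format is feasible. The corpus used in the annotation study is publicly available.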
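A diagnostic comment expressed as template selection plus slot filling can be sketched as a template identifier and a set of slot values. The Python below is a hypothetical illustration, not the paper's template set or model: the two entries in `TEMPLATES`, their slot names, and the functions `slot_names` and `render_comment` are assumptions made up for the example.

```python
import string

# Hypothetical templates; the actual template set is defined by the authors.
TEMPLATES = {
    "missing_evidence": "The claim that {claim} is not supported by any evidence.",
    "weak_link": "It is unclear why {premise} leads to {conclusion}.",
}


def slot_names(template):
    """List the slot names of a template, in order of appearance."""
    parsed = string.Formatter().parse(template)
    return [field for _, field, _, _ in parsed if field]


def render_comment(template_id, slots):
    """Fill a selected template, insisting that the slots match exactly."""
    template = TEMPLATES[template_id]
    required = set(slot_names(template))
    if set(slots) != required:
        raise ValueError(f"expected slots {sorted(required)}, got {sorted(slots)}")
    return template.format(**slots)


# A prediction is a (template id, slot values) pair, so it can be compared
# field by field against a reference annotation.
prediction = ("weak_link", {"premise": "students dislike homework",
                            "conclusion": "homework should be banned"})
print(render_comment(*prediction))
```

Because the output is a structured pair rather than free text, a predicted template identifier and each slot value can be checked separately against the annotation, which is one reading of why this formulation eases automatic evaluation.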